Skip to content

🌱 Migrate gc to aws sdk v2 #5518

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Danil-Grigorev
Copy link
Member

@Danil-Grigorev Danil-Grigorev commented Jun 3, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Migrate ELB garbage collection to AWS SDK v2

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5406

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Migrate ELB garbage collection to AWS SDK v2

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/support Categorizes issue or PR as a support question. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 3, 2025
@k8s-ci-robot k8s-ci-robot added needs-priority needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 3, 2025
@Danil-Grigorev Danil-Grigorev force-pushed the migrate-gc-to-aws-sdk-v2 branch from 7b8080f to 1822ed1 Compare June 3, 2025 19:05
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2025
@Danil-Grigorev Danil-Grigorev force-pushed the migrate-gc-to-aws-sdk-v2 branch from 1822ed1 to adc451e Compare June 3, 2025 19:06
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2025
@Danil-Grigorev Danil-Grigorev force-pushed the migrate-gc-to-aws-sdk-v2 branch from a75be30 to b39b276 Compare June 4, 2025 06:31
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 4, 2025
@Danil-Grigorev Danil-Grigorev changed the title [WIP] Migrate gc to aws sdk v2 🌱 Migrate gc to aws sdk v2 Jun 4, 2025
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 4, 2025
@Danil-Grigorev Danil-Grigorev force-pushed the migrate-gc-to-aws-sdk-v2 branch 5 times, most recently from fccf39b to 12a5c0a Compare June 4, 2025 07:04
@Danil-Grigorev
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks
/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Jun 4, 2025
@Danil-Grigorev Danil-Grigorev force-pushed the migrate-gc-to-aws-sdk-v2 branch from 12a5c0a to 2cfd1e6 Compare June 4, 2025 08:28
@damdo
Copy link
Member

damdo commented Jun 4, 2025

/assign @nrb

@damdo
Copy link
Member

damdo commented Jul 8, 2025

/test pull-cluster-api-provider-aws-e2e-eks pull-cluster-api-provider-aws-e2e

@alexander-demicev
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 8, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2483a48a44516c7def3b52a424790a0e094e9d9b

@damdo
Copy link
Member

damdo commented Jul 8, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@nrb
Copy link
Contributor

nrb commented Jul 8, 2025

Looking at the failures, I see this looping in the logs:

☸ kind-capi-test in bootstrap/controllers/capa-controller-manager
❯ rg ignition | tail -n 2
capa-controller-manager-7d6df5dc8d-gkfxc/manager.log:I0708 19:47:44.178421       1 network.go:94] "Deleting network" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="functional-test-ignition-owi2r3/functional-test-ignition-ox98fe" namespace="functional-test-ignition-owi2r3" name="functional-test-ignition-ox98fe" reconcileID="31553a5f-193a-4113-af48-af17237031eb" cluster="functional-test-ignition-owi2r3/functional-test-ignition-ox98fe"
capa-controller-manager-7d6df5dc8d-gkfxc/manager.log:E0708 19:47:44.306543       1 controller.go:347] "Reconciler error" err="failed delete reconcile for gc service: collecting resources: describe tags of loadbalancers: operation error Elastic Load Balancing: DescribeTags, https response error StatusCode: 400, RequestID: bca961a7-309f-4298-ab5c-1e14611f3f74, LoadBalancerNotFound: Cannot find Load Balancer ab8c357d274834410a9701544c338898" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="functional-test-ignition-owi2r3/functional-test-ignition-ox98fe" namespace="functional-test-ignition-owi2r3" name="functional-test-ignition-ox98fe" reconcileID="31553a5f-193a-4113-af48-af17237031eb"

I'm not sure that that's the problem though.

I also thought that perhaps we're not passing in the right tags for the DescribeLoadBalancerInput, but looking at e5a2d41#diff-2ef81192d9a7109b58d85f27cf8a725c0c8f513aff894c2ef6abe722f132ef27L137 it appears that we didn't change anything, at least at that call site.

@damdo
Copy link
Member

damdo commented Jul 9, 2025

@Danil-Grigorev it looks like this EKS GC test failed, is it related with your changes?

@nrb
Copy link
Contributor

nrb commented Jul 9, 2025

/test pull-cluster-api-provider-aws-e2e

@damdo
Copy link
Member

damdo commented Jul 10, 2025

I see E0709 23:37:59.990774 1 controller.go:347] "Reconciler error" err="failed delete reconcile for gc service: collecting resources: describe tags of loadbalancers: operation error Elastic Load Balancing: DescribeTags, https response error StatusCode: 400, RequestID: 9d28cb3e-fc90-481b-8959-9fe80943e548, LoadBalancerNotFound: Cannot find Load Balancer aa577791471ab4a1c9a3f9061f21e46e" controller="awsmanagedcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="AWSManagedControlPlane" AWSManagedControlPlane="eks-extresgc-alterstrategy-75mq6u/eks-extresgc-alterstrategy-iunwv6-control-plane" namespace="eks-extresgc-alterstrategy-75mq6u" name="eks-extresgc-alterstrategy-iunwv6-control-plane" reconcileID="f84edcad-5f45-44e8-8913-c38d0a74b49b" in the EKS test https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5518/pull-cluster-api-provider-aws-e2e-eks/1943046900202606592/artifacts/clusters/bootstrap/controllers/capa-controller-manager/capa-controller-manager-5964db88d9-25d55/manager.log

cc. @Danil-Grigorev

Migrate the garbage collection service and related components to use the AWS SDK v2.
This involves updating dependencies, refactoring client initialization, and updating API calls within the GC logic.
Adds new v2 specific converters, filters, and mocks.

Migrate the garbage collection logic for AWS Application Load Balancers (ALBs) and Network Load Balancers (NLBs) to use the AWS SDK v2.

Migrate the garbage collection logic for Classic Load Balancers (ELB) from the AWS SDK v1 to v2.

Signed-off-by: Danil-Grigorev <[email protected]>
@Danil-Grigorev Danil-Grigorev force-pushed the migrate-gc-to-aws-sdk-v2 branch from eb0436a to ae57778 Compare July 16, 2025 07:11
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2025
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jul 16, 2025

@Danil-Grigorev: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-aws-apidiff-main ae57778 link false /test pull-cluster-api-provider-aws-apidiff-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Danil-Grigorev
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e-eks pull-cluster-api-provider-aws-e2e

@damdo damdo added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jul 16, 2025
Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me.
@punkwalker any concerns with merging this? It looks like tests are all passing now

@damdo
Copy link
Member

damdo commented Jul 16, 2025

/assign @punkwalker

@k8s-ci-robot
Copy link
Contributor

@damdo: GitHub didn't allow me to assign the following users: punkwalker.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @punkwalker

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

@alexander-demicev alexander-demicev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 0695a7fb4a701bf287fff4fc821f4d0e86fb31cb

@nrb
Copy link
Contributor

nrb commented Jul 16, 2025

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2025
@nrb
Copy link
Contributor

nrb commented Jul 16, 2025

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2025
@k8s-ci-robot k8s-ci-robot merged commit 8d4c7f2 into kubernetes-sigs:main Jul 16, 2025
20 of 21 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v2.8 milestone Jul 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/support Categorizes issue or PR as a support question. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate gc code to AWS SDK v2
9 participants